A UNIFIED DEVELOPMENT OF SEVERAL TECHNIQUES FOR 
THE REPRESENTATION OF RANDOM VECTORS 
AND DATA SETS 

By W. Thomas Bundick 
Langley Research Center 

SUMMARY 

Linear vector space theory is used to develop a general representation of a set of 
vectors by linear combinations of orthonormal vectors such that the mean squared error 
of the representation is minimized. The vectors to be represented may be deterministic 
or random and may be of any dimension. From the extremal properties of eigenvectors, 
the optimum orthonormal vectors are found to be the eigenvectors of an operator deter- 
mined by the correlation function or matrix of the represented vectors. It is also proven 
that the mean squared error in the representation and the second moment of the coeffi- 
cient vectors can be determined from the eigenvalues of the operator. An entropy func- 
tion defined on the orthonormal coordinate vectors is minimized by the representation, 
and its value also can be computed from the eigenvalues. 

The general representation is applied to several specific problems involving the 
use of the familiar Karhunen-Loeve expansion, principal component analysis, and empir- 
ical orthogonal functions. 


INTRODUCTION 

The representation of a function or vector by an infinite or finite sum of basis func- 
tions or vectors is a vital analytical tool in engineering, science, and mathematics. For 
example, the Karhunen-Loeve expansion of a random process in terms of an infinite series 
with uncorrelated coefficients has become widely used since its introduction by Karhunen 
in the 1940's, particularly in communication and radar detection theory. (See DiFranco 
and Rubin, ref. 1.) The approximation of a set of data vectors in terms of an incomplete 
set of basis vectors has been used by statisticians and others to reduce the dimensionality 
of the data set. A representation of this type called principal component analysis was 
introduced in the field of psychology in the early 1900’s (Hotelling, ref. 2). The same 
approximation technique under the name empirical orthogonal functions has recently found 
considerable use in meteorology to represent atmospheric temperature and water vapor 



profiles. This technique was proposed by Pomalaza (ref. 3) for use in inverting the data 
from a multisatellite-microwave-occultation experiment to obtain atmospheric tempera- 
ture and pressure profiles on a global basis. One of the many analytical techniques that 
has been investigated in feature extraction and pattern recognition studies is the trans- 
formation of a set of pattern vectors onto a set of coordinates which minimizes the 
entropy of the representation (Watanabe et al., ref. 4). 

Each of the representations discussed above can be employed in both finite and infi- 
nite dimensional spaces. An investigation of the theory of each of them reveals that they 
all share a common basis (viz. the extremal properties of quadratic forms and eigenval- 
ues) and, furthermore, all of them share certain common properties. 

In view of these commonalities it should be enlightening to have a unified, general, 
theoretical development of these representations and of their common properties. It is 
the purpose of this paper to present such a development in terms of linear vector space 
theory. 

First to be presented is the theoretical development of a representation of a set of 
vectors such that the mean squared error of the representation is minimized. This will 
be followed by an examination, including proofs, of certain important properties of the 
representation. Then the general representation will be specialized to several applications
of the theory, including those discussed in the opening paragraph of the introduction. 
For those readers who desire further clarification, several illustrative examples of the 
techniques are included as appendix A. 


SYMBOLS AND NOTATION 


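$A$    amplitude constant
$a_j$    a constant
$\mathcal{B}$    a $\sigma$-algebra of subsets of $\Omega$
$\mathcal{C}$    field of complex numbers
$\underline{C}$    N x K matrix of coefficients $c_{nk}$
$\underline{c}_k$    vector with components $c_{nk}$
$c_{nk}$    Fourier coefficient of vector $\underline{\phi}_k$ in representation of $\underline{x}_n$
$\mathcal{E}$    Euclidean space
$\underline{e}_n$    error vector in representation of $\underline{x}_n$
$\underline{f}$    arbitrary vector in the space $\mathcal{V}$
$\underline{g}$    arbitrary vector in $\mathcal{V}$
$H$    entropy function
$j$    $\sqrt{-1}$
$\underline{K}$    arbitrary self-adjoint transformation
$K$    number of orthonormal vectors in representation of $\underline{x}_n$
$\underline{L}$    self-adjoint transformation on $\mathcal{V}$ defined by $\underline{L}\,\underline{\phi}_k = E\{\underline{T}^*\underline{T}\,\underline{\phi}_k\}$
$\mathcal{L}_2$    Hilbert space of square integrable functions
$M$    dimensionality of the space $\mathcal{V}$
$\underline{m}_t, \underline{m}_v, \underline{m}_x$    average of vectors $\underline{t}_n$, $\underline{v}_n$, and $\underline{x}_n$, respectively
$m_x(\cdot)$    average of functions $x_n(\cdot)$
$\mathcal{N}$    space of all N-tuples
$\mathcal{N}'$    subspace of $\mathcal{N}$ and range of $\underline{T}$
$N$    number of vectors $\underline{x}_n$
$N'$    number of linearly independent vectors among $\underline{x}_n$
$\mathcal{P}$    probability measure on $\mathcal{B}$
$P_k$    power in kth harmonic
$p_n$    probability of occurrence of random process $x_n$
$R(s,t)$    autocorrelation function
$\mathcal{S}$    K-dimensional subspace of $\mathcal{V}$ spanned by $\underline{\phi}_k$
$s, t$    real parameters
$\underline{T}$    a linear transformation on $\mathcal{V}$ onto $\mathcal{N}'$ defined by $\underline{T}\,\underline{f} = (T_1\underline{f}, \ldots, T_N\underline{f})$
$T$    period of x(t) in example A3
$T_n$    a linear transformation (functional) on $\mathcal{V}$ onto $\mathcal{C}$ defined by $T_n\underline{f} = \langle \underline{f}, \underline{x}_n - \underline{\phi}_0 \rangle$
$\underline{t}_n$    atmospheric temperature profiles, °C (see appendix A)
$t_{rms}$    root-mean-square error in representation of $\underline{t}_n$, °C
$\underline{U}$    5 x 3 matrix of row vectors $\underline{u}_n$
$\underline{u}$    N-dimensional vector with components $u_n$
$u$    arbitrary real constant
$\underline{u}_n$    three-dimensional vectors defined by $\underline{u}_n = \underline{v}_n - \underline{m}_v$
$u_n$    components of $\underline{u}$ defined by $u_n = \langle \underline{\psi}_k, \underline{x}_n \rangle$
$\mathcal{V}$    infinite dimensional Hilbert space or M-dimensional unitary space containing $\underline{x}_n$
$\mathcal{V}'$    subspace of $\mathcal{V}$ and range of $\underline{T}^*$
$\underline{v}$    vector with N components $v_0$
$\underline{v}_n$    data vectors in three-dimensional Euclidean space
$v_0$    components of $\underline{v}$ defined by $v_0 = \langle \underline{\psi}_k, \underline{\phi}_0 \rangle$
$\underline{X}$    N x M matrix of row vectors $\underline{x}_n$
$\underline{x}$    a random vector
$x(\cdot)$    a stochastic process
$x, y, z$    Cartesian coordinates
$\underline{x}_n$    a vector, either deterministic or random, in $\mathcal{V}$
$x_n(\cdot)$    vector $\underline{x}_n$ expressed as function of a parameter
$y_n$    imaginary part of $u_n$
$y_0$    imaginary part of $v_0$
$\underline{Z}$    N x M matrix of row vectors $\underline{z}_n$
$\underline{z}(\cdot)$    column vector with N components $x_n(\cdot) - m_x(\cdot)$
$\underline{z}_n$    vector $\underline{x}_n$ after removal of average; i.e., $\underline{z}_n = \underline{x}_n - \underline{m}_x$
$z_n$    real part of $u_n$
$z_0$    real part of $v_0$
$\delta_{jk}$    Kronecker delta
$\zeta$    a random variable
$\theta^2$    error in representation of $\underline{x}_n$
$\lambda_k$    kth eigenvalue of operator $\underline{L}$
$\mu_j$    jth eigenvalue of operator $\underline{K}$
$\underline{\nu}_j$    jth eigenvector of operator $\underline{K}$
$\underline{\Sigma}$    covariance matrix
$\rho_k$    mean squared value of kth coordinate vector $\underline{c}_k$
$\sigma^2$    variance of K coefficient vectors
$\sigma_v^2$    variance of projected data (example A1)
$\tau$    an interval on the real line
$\underline{\Phi}_0$    N x M matrix of row vectors $\underline{m}_x$
$\underline{\phi}_k$    kth orthonormal vector
$\phi_k(\cdot)$    functional form of $\underline{\phi}_k$
$\underline{\phi}_0$    constant vector in representation of $\underline{x}_n$
$\underline{\Psi}$    K x M matrix of eigenvectors $\underline{\psi}_k$
$\underline{\Psi}_M$    M x M matrix of eigenvectors $\underline{\psi}_k$
$\underline{\psi}_k$    kth eigenvector of operator $\underline{L}$
$\psi_k(\cdot)$    functional form of $\underline{\psi}_k$
$\Omega$    a nonempty set forming the basic space of the probability space $(\Omega, \mathcal{B}, \mathcal{P})$
$\omega_k$    frequency (see appendix A)

Special mathematical notation:

$E\{\underline{x}_n\}$    statistical expected value of $\underline{x}_n$
$\langle \underline{f}, \underline{g} \rangle$    inner product of vectors $\underline{f}$ and $\underline{g}$
$\|\underline{f}\|$    norm of vector $\underline{f}$ defined as $\langle \underline{f}, \underline{f} \rangle^{1/2}$
$|b|$    absolute value of a scalar b
$f^*$    complex conjugate of f
$\underline{\Psi}^T$    transpose of matrix $\underline{\Psi}$
$\underline{T}^*$    adjoint, or conjugate transpose, of operator $\underline{T}$
$\mathrm{tr}\,\underline{H}$    trace of matrix $\underline{H}$
$\underline{\Psi}^{-1}$    matrix inverse of $\underline{\Psi}$

A bar under a symbol is used to indicate a vector, a matrix, or an operator. 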

THEORETICAL DEVELOPMENT OF THE REPRESENTATION 

Consider a Hilbert space $\mathcal{V}$, which may be the infinite dimensional space $\mathcal{L}_2$ of square integrable functions or a finite dimensional Euclidean or unitary space. In any case, the dimension of $\mathcal{V}$ will be designated as M, although M may be infinite. Further consider a set of vectors $\underline{x}_n$, where n = 1, 2, . . ., N, which may be deterministic or random. If the $\underline{x}_n$ are deterministic, it is assumed that they are square integrable (summable) such that they are elements of the space $\mathcal{V}$. The expectation operator $E\{\cdot\}$ is then defined as multiplication by the real constant 1/N. If the $\underline{x}_n$ are random, they are defined on some probability space $(\Omega, \mathcal{B}, \mathcal{P})$ with points $\omega$ in $\Omega$ and probability measure $\mathcal{P}$. In this case, the expectation operator is defined as the integral $\int_\Omega \{\cdot\}\, d\mathcal{P}(\omega)$. It is further assumed that the vectors are measurable and mean square integrable (summable). Then, almost every sample function of the random process (vector) is an element of $\mathcal{V}$.

The problem to be considered is the representation of the vectors $\underline{x}_n$ by a constant vector $\underline{\phi}_0 \in \mathcal{V}$ plus linear combinations of $K \le M$ orthonormal vectors $\underline{\phi}_k \in \mathcal{V}$ such that the mean squared error is minimized. In other words, let

$$\underline{x}_n = \underline{\phi}_0 + \sum_{k=1}^{K} c_{nk}\,\underline{\phi}_k + \underline{e}_n \qquad (n = 1, 2, \ldots, N) \tag{1}$$

where the $c_{nk}$ are the coefficients of the expansion and the $\underline{e}_n$ are the error vectors associated with the representation. Define the mean squared error in terms of the norms of the error vectors; that is, let

$$\theta^2 = \sum_{n=1}^{N} \|\underline{e}_n\|^2 \tag{2}$$

The problem then is to find the optimum set of vectors $\underline{\phi}_k$, the constant vector $\underline{\phi}_0$, and the coefficients $c_{nk}$ such that $E\{\theta^2\}$ is minimized. 


The Coefficients 

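The error in the representation of the vector $\underline{x}_n$ is expressed by

$$\underline{e}_n = \underline{x}_n - \underline{\phi}_0 - \sum_{k=1}^{K} c_{nk}\,\underline{\phi}_k \qquad (n = 1, 2, \ldots, N) \tag{3}$$

Now the set of all linear combinations of the vectors $\underline{\phi}_k$, where k = 1, . . ., K, forms a K-dimensional subspace $\mathcal{S}$ of the vector space $\mathcal{V}$. From the theory of linear vector spaces, the error $\|\underline{e}_n\|^2$ is minimized when $\sum_{k=1}^{K} c_{nk}\,\underline{\phi}_k$ is the orthogonal projection of $\underline{x}_n - \underline{\phi}_0$ onto the subspace $\mathcal{S}$ and when the coefficients $c_{nk}$ are the Fourier coefficients defined by the inner product (Berberian, ref. 5, p. 46 and Ficken, ref. 6, p. 303)

$$c_{nk} = \langle \underline{\phi}_k, \underline{x}_n - \underline{\phi}_0 \rangle \qquad \begin{pmatrix} n = 1, 2, \ldots, N \\ k = 1, 2, \ldots, K \end{pmatrix} \tag{4}$$

Furthermore, the error vector $\underline{e}_n$ is orthogonal to the projection $\sum_{k=1}^{K} c_{nk}\,\underline{\phi}_k$; thus the square of the norm of $\underline{e}_n$ simplifies to

$$\|\underline{e}_n\|^2 = \|\underline{x}_n - \underline{\phi}_0\|^2 - \Big\| \sum_{k=1}^{K} c_{nk}\,\underline{\phi}_k \Big\|^2 \tag{5a}$$

Since the vectors $\underline{\phi}_k$ are orthonormal, equation (5a) can be further simplified to

$$\|\underline{e}_n\|^2 = \|\underline{x}_n - \underline{\phi}_0\|^2 - \sum_{k=1}^{K} |c_{nk}|^2 \tag{5b}$$

The total squared error now becomes

$$\sum_{n=1}^{N} \|\underline{e}_n\|^2 = \sum_{n=1}^{N} \|\underline{x}_n - \underline{\phi}_0\|^2 - \sum_{n=1}^{N} \sum_{k=1}^{K} |c_{nk}|^2 \tag{6}$$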
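The Orthonormal Vectors 

To find the optimum vectors $\underline{\phi}_k$, assuming $\underline{\phi}_0$ fixed, first rearrange equation (6) by interchanging the order of summation; that is,

$$\theta^2 = \sum_{n=1}^{N} \|\underline{x}_n - \underline{\phi}_0\|^2 - \sum_{k=1}^{K} \sum_{n=1}^{N} |c_{nk}|^2 \tag{7}$$

Write the coefficients as N component vectors $\underline{c}_k$, where k = 1, 2, . . ., K, with components $c_{nk}$. The vectors $\underline{c}_k$ are elements of the vector space $\mathcal{N}$ of N-tuples. Equation (7) can now be rewritten as

$$\theta^2 = \sum_{n=1}^{N} \|\underline{x}_n - \underline{\phi}_0\|^2 - \sum_{k=1}^{K} \langle \underline{c}_k, \underline{c}_k \rangle \tag{8}$$

Define the linear functionals $T_n$ on $\mathcal{V}$ onto $\mathcal{C}$ by the inner product

$$T_n \underline{f} = \langle \underline{f}, \underline{x}_n - \underline{\phi}_0 \rangle \tag{9}$$

where $\underline{f} \in \mathcal{V}$ and $T_n \underline{f} \in \mathcal{C}$. Now the linear transformation $\underline{T}$ on $\mathcal{V}$ onto $\mathcal{N}' \subset \mathcal{N}$ is defined by

$$\underline{T}\,\underline{f} = (T_1 \underline{f}, T_2 \underline{f}, \ldots, T_N \underline{f}) = \big( \langle \underline{f}, \underline{x}_1 - \underline{\phi}_0 \rangle, \ldots, \langle \underline{f}, \underline{x}_N - \underline{\phi}_0 \rangle \big) \tag{10}$$

The dimension of $\mathcal{N}'$ is M or N', whichever is less, where N' is the number of linearly independent vectors among the $\underline{x}_n$. Then the vectors $\underline{c}_k$ can be defined by

$$\underline{c}_k = \underline{T}\,\underline{\phi}_k \tag{11}$$

Combination of equations (8) and (11) produces

$$\theta^2 = \sum_{n=1}^{N} \|\underline{x}_n - \underline{\phi}_0\|^2 - \sum_{k=1}^{K} \langle \underline{T}\,\underline{\phi}_k, \underline{T}\,\underline{\phi}_k \rangle \tag{12a}$$

which can be written as

$$\theta^2 = \sum_{n=1}^{N} \|\underline{x}_n - \underline{\phi}_0\|^2 - \sum_{k=1}^{K} \langle \underline{\phi}_k, \underline{T}^*\underline{T}\,\underline{\phi}_k \rangle \tag{12b}$$

where $\underline{T}^*$, the adjoint of the operator $\underline{T}$, is on $\mathcal{N}$ onto $\mathcal{V}' \subset \mathcal{V}$ with the same dimension as $\mathcal{N}'$. The mean squared error is found by taking the expected value of $\theta^2$ to obtain the following:

$$E\{\theta^2\} = E\Big\{ \sum_{n=1}^{N} \|\underline{x}_n - \underline{\phi}_0\|^2 \Big\} - \sum_{k=1}^{K} E\{ \langle \underline{\phi}_k, \underline{T}^*\underline{T}\,\underline{\phi}_k \rangle \} \tag{13a}$$

$$E\{\theta^2\} = E\Big\{ \sum_{n=1}^{N} \|\underline{x}_n - \underline{\phi}_0\|^2 \Big\} - \sum_{k=1}^{K} \langle \underline{\phi}_k, \underline{L}\,\underline{\phi}_k \rangle \tag{13b}$$

where the operator $\underline{L}$ is defined on $\mathcal{V}$ by

$$\underline{L}\,\underline{\phi}_k = E\{ \underline{T}^*\underline{T}\,\underline{\phi}_k \} \tag{14}$$

The mean squared error in equation (13b) can now be minimized with respect to the $\underline{\phi}_k$ by choosing the set of orthonormal vectors $\underline{\phi}_k$ such that the sum $\sum_{k=1}^{K} \langle \underline{\phi}_k, \underline{L}\,\underline{\phi}_k \rangle$ is a maximum.

To select the optimum $\underline{\phi}_k$, a theorem proven by Jordan in reference 7 concerning the extremal properties of eigenvalues will be utilized. This theorem, restated in the notation of the present paper, is summarized as follows:

Let $\underline{K}$ be a self-adjoint linear operator, $\underline{\nu}_j$ be orthonormal vectors, and $a_j$ be constants such that $a_1 \ge a_2 \ge \ldots \ge a_J$. Then the sum

$$\sum_{j=1}^{J} a_j \langle \underline{\nu}_j, \underline{K}\,\underline{\nu}_j \rangle$$

is maximized with respect to the $\underline{\nu}_j$ when the $\underline{\nu}_j$ are the normalized eigenvectors of the operator $\underline{K}$ corresponding to the J largest eigenvalues $\mu_j$. Furthermore, this maximum value of the sum is equal to

$$\sum_{j=1}^{J} a_j \mu_j$$

This theorem can be applied to the present problem since the operator $\underline{L}$ is self-adjoint; that is, $(\underline{T}^*\underline{T})^* = \underline{T}^*\underline{T}$. Thus the mean squared error $E\{\theta^2\}$ is minimized when the vectors $\underline{\phi}_k$ are the normalized eigenvectors $\underline{\psi}_k$ of the operator $\underline{L}$ corresponding to the K largest eigenvalues $\lambda_k$ of $\underline{L}$. The representation of the vectors $\underline{x}_n$ in equation (1) now becomes

$$\underline{x}_n = \underline{\phi}_0 + \sum_{k=1}^{K} c_{nk}\,\underline{\psi}_k + \underline{e}_n \qquad (n = 1, 2, \ldots, N) \tag{15}$$

and the mean squared error in equation (13b) can be written as

$$E\{\theta^2\} = E\Big\{ \sum_{n=1}^{N} \|\underline{x}_n - \underline{\phi}_0\|^2 \Big\} - \sum_{k=1}^{K} \lambda_k \tag{16}$$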
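The extremal theorem used above can be checked numerically in a finite dimensional space. The following is a minimal sketch, not part of the original development: the matrix `K_op`, the weights `a`, and the helper `weighted_sum` are hypothetical names chosen for illustration, the operator is a random real symmetric matrix, and the weights are taken nonnegative as an added assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
M, J = 6, 3
B = rng.standard_normal((M, M))
K_op = B @ B.T                        # self-adjoint (real symmetric) operator
a = np.array([3.0, 2.0, 1.0])         # assumed weights, a_1 >= a_2 >= a_3 >= 0

def weighted_sum(V):
    """Sum over j of a_j <v_j, K v_j> for the orthonormal columns v_j of V."""
    return float(np.sum(a * np.einsum("ij,ik,kj->j", V, K_op, V)))

# Eigenvectors sorted so that the eigenvalues mu_j are in descending order.
mu, nu = np.linalg.eigh(K_op)
mu, nu = mu[::-1], nu[:, ::-1]
best = weighted_sum(nu[:, :J])

# Compare against randomly drawn orthonormal sets (QR of a Gaussian matrix).
trial_max = max(
    weighted_sum(np.linalg.qr(rng.standard_normal((M, J)))[0])
    for _ in range(500)
)
print(best, float(a @ mu[:J]), trial_max)
```

In this sketch the eigenvector choice attains the value of the weighted eigenvalue sum, and none of the random orthonormal sets exceeds it.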


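A Basis for $\mathcal{V}$

At this point, it is advantageous to pause before proceeding with the optimization of the constant vector $\underline{\phi}_0$ in order to develop an alternate expression for the total error in equation (16). If the null space of the operator is included in the eigenspace as eigenvectors corresponding to the eigenvalue $\lambda_k = 0$, then the normalized eigenvectors $\underline{\psi}_k$ of a compact, or completely continuous, normal transformation form a complete orthonormal basis for the space $\mathcal{V}$. (See Berberian, ref. 5, p. 186.) For the moment assume that the operator $\underline{L}$ is compact; this assumption will be verified in appendix B. Under this condition, any vector $\underline{f} \in \mathcal{V}$ can be expressed in terms of the basis $\underline{\psi}_k$ and the Fourier coefficients. In equation (16) the vectors $\underline{x}_n - \underline{\phi}_0$ may be expanded in terms of the basis as follows:

$$\underline{x}_n - \underline{\phi}_0 = \sum_{k=1}^{M} \langle \underline{\psi}_k, \underline{x}_n - \underline{\phi}_0 \rangle\, \underline{\psi}_k \qquad (n = 1, 2, \ldots, N) \tag{17}$$

where M may be infinite, depending on the dimensionality of the space $\mathcal{V}$. Using equations (9) and (17), the first sum on the right-hand side of equation (6) can be written as

$$\sum_{n=1}^{N} \|\underline{x}_n - \underline{\phi}_0\|^2 = \sum_{n=1}^{N} \Big\| \sum_{k=1}^{M} (T_n \underline{\psi}_k)\, \underline{\psi}_k \Big\|^2 \tag{18a}$$

The orthonormality of the $\underline{\psi}_k$ allows equation (18a) to be written as

$$\sum_{n=1}^{N} \|\underline{x}_n - \underline{\phi}_0\|^2 = \sum_{n=1}^{N} \sum_{k=1}^{M} |T_n \underline{\psi}_k|^2 \tag{18b}$$

Interchanging the order of summation and taking the expectation produces the following expression:

$$E\Big\{ \sum_{n=1}^{N} \|\underline{x}_n - \underline{\phi}_0\|^2 \Big\} = \sum_{k=1}^{M} E\{ \langle \underline{T}\,\underline{\psi}_k, \underline{T}\,\underline{\psi}_k \rangle \} = \sum_{k=1}^{M} \langle \underline{\psi}_k, \underline{L}\,\underline{\psi}_k \rangle \tag{19}$$

Since the $\underline{\psi}_k$ are the normalized eigenvectors of the operator $\underline{L}$, equation (19) can be expressed in terms of a converging series of the eigenvalues $\lambda_k$

$$E\Big\{ \sum_{n=1}^{N} \|\underline{x}_n - \underline{\phi}_0\|^2 \Big\} = \sum_{k=1}^{M} \lambda_k \tag{20}$$

Substitution of equation (20) into equation (16) yields the following expression for the mean squared error:

$$E\{\theta^2\} = \sum_{k=1}^{M} \lambda_k - \sum_{k=1}^{K} \lambda_k = \sum_{k=K+1}^{M} \lambda_k \tag{21}$$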


The Constant Vector 

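Turn now to the problem of finding the optimum constant vector $\underline{\phi}_0$ which minimizes the mean squared error. From equation (21) minimization of the mean squared error can be accomplished by minimizing the eigenvalues $\lambda_k$ for k = K + 1, . . ., M. Each eigenvalue of the self-adjoint transformation $\underline{L}$ can be written as follows:

$$E\{ \langle \underline{T}\,\underline{\psi}_k, \underline{T}\,\underline{\psi}_k \rangle \} = \langle \underline{\psi}_k, \underline{L}\,\underline{\psi}_k \rangle = \langle \underline{\psi}_k, \lambda_k \underline{\psi}_k \rangle = \lambda_k \qquad (k = 1, 2, \ldots, M) \tag{22}$$

From equation (10), $\underline{T}\,\underline{\psi}_k$ is an N-dimensional row vector defined by

$$\underline{T}\,\underline{\psi}_k = \big( \langle \underline{\psi}_k, \underline{x}_1 - \underline{\phi}_0 \rangle, \ldots, \langle \underline{\psi}_k, \underline{x}_N - \underline{\phi}_0 \rangle \big) \tag{23}$$

Define the N-dimensional vectors $\underline{u}$ and $\underline{v}$ by

$$\underline{u} = \big( \langle \underline{\psi}_k, \underline{x}_1 \rangle, \ldots, \langle \underline{\psi}_k, \underline{x}_N \rangle \big) = (u_1, \ldots, u_N) \tag{24a}$$

and

$$\underline{v} = \big( \langle \underline{\psi}_k, \underline{\phi}_0 \rangle, \ldots, \langle \underline{\psi}_k, \underline{\phi}_0 \rangle \big) = (v_0, \ldots, v_0) \tag{24b}$$

Combining equations (23), (24a), and (24b) allows equation (22) to be expressed in terms of $\underline{u}$ and $\underline{v}$ as follows:

$$\lambda_k = E\{ \langle \underline{u} - \underline{v}, \underline{u} - \underline{v} \rangle \} = E\Big\{ \sum_{n=1}^{N} \big[ (z_n - z_0)^2 + (y_n - y_0)^2 \big] \Big\} \tag{25}$$

where $u_n = z_n + j y_n$ and $v_0 = z_0 + j y_0$. To find the vector $\underline{v}$ which minimizes $\lambda_k$, take the partial derivatives of equation (25) with respect to $z_0$ and $y_0$ and set the results equal to zero

$$\frac{\partial \lambda_k}{\partial z_0} = -2 E\Big\{ \sum_{n=1}^{N} (z_n - z_0) \Big\} = 0 \tag{26a}$$

$$\frac{\partial \lambda_k}{\partial y_0} = -2 E\Big\{ \sum_{n=1}^{N} (y_n - y_0) \Big\} = 0 \tag{26b}$$

or

$$E\Big\{ \sum_{n=1}^{N} (u_n - v_0) \Big\} = 0 \tag{26c}$$

Substitution for $u_n$ and $v_0$ from equations (24) into equation (26c) produces the following expression:

$$E\Big\{ \sum_{n=1}^{N} \langle \underline{\psi}_k, \underline{x}_n - \underline{\phi}_0 \rangle \Big\} = 0 \tag{27}$$

which, by the linearity of the expectation and of the inner product, can be written as

$$\Big\langle \underline{\psi}_k, E\Big\{ \sum_{n=1}^{N} (\underline{x}_n - \underline{\phi}_0) \Big\} \Big\rangle = 0 \tag{28b}$$

for k = K + 1, . . ., M. Equation (28b) will be satisfied for any K and the error will be a minimum if

$$\underline{\phi}_0 = \underline{m}_x \tag{29}$$

where $\underline{m}_x$ is a solution of

$$E\Big\{ \sum_{n=1}^{N} (\underline{x}_n - \underline{m}_x) \Big\} = 0 \tag{30}$$

The problem of representing the vectors $\underline{x}_n$ with minimum mean squared error has now been solved. The optimum representation has been shown to be

$$\underline{x}_n \approx \underline{m}_x + \sum_{k=1}^{K} c_{nk}\,\underline{\psi}_k \qquad (n = 1, 2, \ldots, N) \tag{31}$$

where the $\underline{\psi}_k$ are the eigenvectors of the operator $\underline{L}$, $\underline{m}_x$ is defined by equation (30), and the $c_{nk}$ are the Fourier coefficients defined by the following equation (32), which corresponds to equation (4):

$$c_{nk} = \langle \underline{\psi}_k, \underline{x}_n - \underline{m}_x \rangle \qquad \begin{pmatrix} n = 1, 2, \ldots, N \\ k = 1, 2, \ldots, K \end{pmatrix} \tag{32}$$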
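The role of the constant vector can be illustrated for deterministic data in a real M-dimensional space, where the expectation is multiplication by 1/N. The sketch below is an illustration under stated assumptions rather than part of the original paper: the data matrix `X` is synthetic, and the helper name `tail_eigenvalue_sum` is hypothetical. For each candidate constant vector it evaluates the minimum mean squared error as the sum of the discarded eigenvalues, as in equation (21).

```python
import numpy as np

def tail_eigenvalue_sum(X, phi0, K):
    """Sum of the eigenvalues lambda_k, k > K, of L = (1/N)(X - phi0)^T (X - phi0)."""
    Z = X - phi0
    lam = np.linalg.eigvalsh(Z.T @ Z / X.shape[0])[::-1]   # descending order
    return float(lam[K:].sum())

rng = np.random.default_rng(2)
# Synthetic data: N = 50 vectors of dimension M = 4 with a nonzero average.
X = rng.standard_normal((50, 4)) * [3.0, 2.0, 1.0, 0.5] + [5.0, -1.0, 2.0, 0.0]

m_x = X.mean(axis=0)                                  # sample mean
err_mean = tail_eigenvalue_sum(X, m_x, K=2)           # phi_0 = m_x
err_zero = tail_eigenvalue_sum(X, np.zeros(4), K=2)   # phi_0 = 0
err_shift = tail_eigenvalue_sum(X, m_x + 0.3, K=2)    # phi_0 displaced from m_x
print(err_mean, err_zero, err_shift)
```

Choosing the sample mean as the constant vector gives an error no larger than either alternative in this sketch.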



PROPERTIES OF THE REPRESENTATION 


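Now, consider some of the properties of the representation of equation (31). By hypothesis, the vectors $\underline{\psi}_k$ are orthonormal, and it is obvious that the coefficients $c_{nk}$ are random variables whenever the vectors $\underline{x}_n$ are random. Some additional properties will now be developed.

Property 1 

For any value of K the representation of equation (31) approximates the vectors $\underline{x}_n$ with minimum mean squared error. Property 1 was the basic property of the general development and thus has already been proven.

Property 2 

Property 2 of the representation is that the coefficient vectors $\underline{c}_k$ are uncorrelated; thus

$$E\{ \langle \underline{c}_j, \underline{c}_k \rangle \} = E\{ \langle \underline{T}\,\underline{\psi}_j, \underline{T}\,\underline{\psi}_k \rangle \} = \langle \underline{\psi}_j, \underline{L}\,\underline{\psi}_k \rangle = \lambda_k \delta_{jk} \tag{33}$$

where $\delta_{jk}$ is the Kronecker delta. 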
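Property 2 can be checked for deterministic real data, where the expectation is multiplication by 1/N and the operator $\underline{L}$ is the matrix $(1/N)\underline{Z}^T\underline{Z}$ of case I. The following is a minimal sketch with synthetic data; the variable names are illustrative assumptions and do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, K = 200, 5, 3
X = rng.standard_normal((N, M)) @ rng.standard_normal((M, M))  # correlated data
Z = X - X.mean(axis=0)                 # remove the average vector m_x

lam, vecs = np.linalg.eigh(Z.T @ Z / N)
lam, vecs = lam[::-1], vecs[:, ::-1]   # eigenvalues in descending order
Psi = vecs[:, :K].T                    # K x M matrix, rows are psi_k

C = Z @ Psi.T                          # column k is the coefficient vector c_k
gram = C.T @ C / N                     # entries E{<c_j, c_k>}
print(np.round(gram, 6))
```

The matrix `gram` is diagonal with the K largest eigenvalues on its diagonal, which is the content of equation (33).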


Property 3 


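The mean norm squared values of the coefficient vectors $\underline{c}_k$ are maximized in turn, beginning with $\underline{c}_1$, by the representation of equation (31), and this maximum value is equal to $\lambda_k$.

It was demonstrated in equation (33) that $E\{\|\underline{c}_k\|^2\} = \lambda_k$. The proof that the expected values $E\{\|\underline{c}_k\|^2\}$ are maximized in turn will be by induction. First,

$$E\{\|\underline{c}_1\|^2\} = E\{ \langle \underline{T}\,\underline{\phi}_1, \underline{T}\,\underline{\phi}_1 \rangle \} = \langle \underline{\phi}_1, \underline{L}\,\underline{\phi}_1 \rangle \tag{34}$$

By the theorem previously used in determining the optimum set of orthonormal vectors, $\langle \underline{\phi}_1, \underline{L}\,\underline{\phi}_1 \rangle$ is maximized when $\underline{\phi}_1$ is the eigenvector of $\underline{L}$ corresponding to the largest eigenvalue. Since this condition is true for equation (34), property 3 is true for $\underline{c}_1$.

Now assume that the mean norm squared values of the $\underline{c}_k$ are maximized in turn for k = 1, 2, . . ., K - 1 by the representation of equation (31). Suppose the mean norm squared value of $\underline{c}_K$ is maximized by a vector $\underline{\phi}_K \ne \underline{\psi}_K$. Then

$$E\{\|\underline{c}_K\|^2\} = \langle \underline{\phi}_K, \underline{L}\,\underline{\phi}_K \rangle > \langle \underline{\psi}_K, \underline{L}\,\underline{\psi}_K \rangle \tag{35}$$

Add $\sum_{k=1}^{K-1} \langle \underline{\psi}_k, \underline{L}\,\underline{\psi}_k \rangle$ to each side of equation (35) to obtain

$$\langle \underline{\phi}_K, \underline{L}\,\underline{\phi}_K \rangle + \sum_{k=1}^{K-1} \langle \underline{\psi}_k, \underline{L}\,\underline{\psi}_k \rangle > \sum_{k=1}^{K} \langle \underline{\psi}_k, \underline{L}\,\underline{\psi}_k \rangle \tag{36}$$

But this inequality is contradictory to Jordan's theorem of reference 7. Therefore, $E\{\|\underline{c}_K\|^2\}$ is maximized by $\underline{\phi}_K = \underline{\psi}_K$, which is the representation in equation (31). 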

Property 4 

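The entropy of the coordinate vectors $\underline{\phi}_k$ is minimized by the representation of equation (31); that is, when

$$\underline{\phi}_k = \underline{\psi}_k \qquad (k = 1, 2, \ldots, K)$$

As in Chien and Fu (ref. 8), define the entropy of the $\underline{\phi}_k$ as follows: Let

$$\rho_k = E\{\|\underline{c}_k\|^2\} \tag{37}$$

and

$$H(\underline{\phi}_k) = -\sum_{k=1}^{K} \rho_k \ln \rho_k \tag{38}$$

From equations (11), (14), and (37) note that

$$\rho_k = E\{ \langle \underline{c}_k, \underline{c}_k \rangle \} = \langle \underline{\phi}_k, \underline{L}\,\underline{\phi}_k \rangle \tag{39}$$

and $\rho_k = \lambda_k$ when $\underline{\phi}_k = \underline{\psi}_k$. Arrange the $\rho_k$ such that $\rho_1 \ge \rho_2 \ge \ldots \ge \rho_K$. From the main theorem with the $a_k = \ln \rho_k$ it can be seen that

$$\sum_{k=1}^{K} \ln \rho_k\, \langle \underline{\psi}_k, \underline{L}\,\underline{\psi}_k \rangle \ge \sum_{k=1}^{K} \ln \rho_k\, \langle \underline{\phi}_k, \underline{L}\,\underline{\phi}_k \rangle \tag{40a}$$

or

$$\sum_{k=1}^{K} \lambda_k \ln \rho_k \ge \sum_{k=1}^{K} \rho_k \ln \rho_k \tag{40b}$$

Pugachev (ref. 9, p. 156) has shown that

$$\ln u \ge 1 - \frac{1}{u} \tag{41}$$

Therefore

$$\sum_{k=1}^{K} \lambda_k \ln \frac{\lambda_k}{\rho_k} \ge \sum_{k=1}^{K} \lambda_k \Big( 1 - \frac{\rho_k}{\lambda_k} \Big) = \sum_{k=1}^{K} \langle \underline{\psi}_k, \underline{L}\,\underline{\psi}_k \rangle - \sum_{k=1}^{K} \langle \underline{\phi}_k, \underline{L}\,\underline{\phi}_k \rangle \ge 0 \tag{42}$$

where the first inequality results from equation (41) and the second from the main theorem. Equation (42) can be rewritten in the following form:

$$\sum_{k=1}^{K} \lambda_k \ln \lambda_k \ge \sum_{k=1}^{K} \lambda_k \ln \rho_k \tag{43}$$

Combining equations (40) and (43) results in the following equations (44):

$$\sum_{k=1}^{K} \lambda_k \ln \lambda_k \ge \sum_{k=1}^{K} \rho_k \ln \rho_k \tag{44a}$$

or

$$H(\underline{\psi}_k) \le H(\underline{\phi}_k) \tag{44b}$$

This property 4 is, of course, true when the set is complete; that is, when K = M. 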
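The entropy inequality of equation (44b) can be examined numerically for the complete case K = M. The sketch below rests on illustrative assumptions: the data are synthetic and real, the constant vector is taken as zero, and `entropy` is a hypothetical helper implementing equation (38). It compares the entropy obtained with the eigenvector coordinates against randomly chosen orthonormal coordinate systems.

```python
import numpy as np

def entropy(rho):
    """H = -sum_k rho_k ln rho_k, as in equation (38)."""
    return float(-np.sum(rho * np.log(rho)))

rng = np.random.default_rng(3)
N, M = 100, 4
X = rng.standard_normal((N, M)) * [2.0, 1.0, 0.5, 0.25]
L = X.T @ X / N                        # operator L with phi_0 = 0, real data

lam = np.linalg.eigvalsh(L)            # rho_k = lambda_k when phi_k = psi_k
H_eig = entropy(lam)

H_other = []
for _ in range(200):
    Q, _ = np.linalg.qr(rng.standard_normal((M, M)))   # columns: orthonormal phi_k
    rho = np.einsum("ik,ij,jk->k", Q, L, Q)            # rho_k = <phi_k, L phi_k>
    H_other.append(entropy(rho))
print(H_eig, min(H_other))
```

None of the random coordinate systems yields an entropy below that of the eigenvector coordinates in this sketch.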
APPLICATIONS OF THE THEORY 


Consider now some applications of the general representation of equation (31) and its associated properties. The applications will be divided into two groups: In the first, the vectors $\underline{x}_n$ are deterministic and values for the coefficients $c_{nk}$ can be computed; in the second, the vectors $\underline{x}_n$ are random vectors and only the statistical properties (variance and correlation) of the random coefficients can be found. Both finite and infinite dimensional spaces will be considered. 

Deterministic or Data Vector Representation 

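Case I: Finite dimensional.- Given a set of N real M-dimensional data vectors $\underline{x}_n$, consider the problem of expressing these N vectors in terms of a constant vector $\underline{\phi}_0$ plus linear combinations of K orthonormal basis, or coordinate, vectors $\underline{\phi}_k$, where $K < M \le N$. Let the criterion for optimizing the representation be minimization of the mean squared error.

In terms of the preceding theoretical development, the current vector space is an M-dimensional Euclidean space and the data vectors could be N measurements of an M component vector. According to property 1, the representation in equation (31) is optimum in terms of mean squared error. Then the constant vector $\underline{\phi}_0$ is equal to the sample mean, which according to equation (30) is defined for this case by

$$\underline{m}_x = E\Big\{ \sum_{n=1}^{N} \underline{x}_n \Big\} \tag{45a}$$

or, since the expectation operator corresponds to multiplication by 1/N,

$$\underline{m}_x = \frac{1}{N} \sum_{n=1}^{N} \underline{x}_n \tag{45b}$$

Furthermore, the coordinate vectors $\underline{\psi}_k$ are the first K eigenvectors of the operator $\underline{L}$, which in the current case is the matrix

$$\underline{L} = \frac{1}{N}\, \underline{Z}^T \underline{Z} \tag{46}$$

where $\underline{Z}$ is the N x M matrix of row vectors $\underline{z}_n$ defined by

$$\underline{z}_n = \underline{x}_n - \underline{m}_x \qquad (n = 1, 2, \ldots, N) \tag{47}$$

That this is true can be seen by restating the generalized development of $\underline{L}$ in terms of the current M-dimensional Euclidean space $\mathcal{E}$. From equation (10) the operator $\underline{T}$ is equivalent to postmultiplication by the transpose of matrix $\underline{Z}$; that is,

$$\underline{T}\,\underline{\phi}_k = \underline{\phi}_k \underline{Z}^T \tag{48}$$

The inner product $\langle \underline{T}\,\underline{\phi}_k, \underline{T}\,\underline{\phi}_k \rangle$ becomes in $\mathcal{E}$

$$\langle \underline{T}\,\underline{\phi}_k, \underline{T}\,\underline{\phi}_k \rangle = \underline{\phi}_k \underline{Z}^T \underline{Z}\, \underline{\phi}_k^T = \langle \underline{\phi}_k, \underline{T}^*\underline{T}\,\underline{\phi}_k \rangle \tag{49}$$

Noting that the expectation operator implies multiplication by 1/N, the operator $\underline{L}$ (eq. (14)) is equivalent in $\mathcal{E}$ to matrix postmultiplication by $\frac{1}{N}\underline{Z}^T\underline{Z}$ as in equation (46). Thus, the $\underline{\psi}_k$ are solutions of the matrix equation

$$\frac{1}{N}\, \underline{\psi}_k \underline{Z}^T \underline{Z} = \lambda_k \underline{\psi}_k \qquad (k = 1, 2, \ldots, K) \tag{50}$$

Note that if each of the data vectors has mean equal to zero, then the matrix $\frac{1}{N}\underline{Z}^T\underline{Z}$ is the sample covariance matrix.

The coefficients $c_{nk}$, as defined by the inner product in equation (32), can be found from

$$c_{nk} = \underline{\psi}_k\, \underline{z}_n^T \qquad \begin{pmatrix} n = 1, 2, \ldots, N \\ k = 1, 2, \ldots, K \end{pmatrix} \tag{51}$$

Let $\underline{C}$ be the N x K matrix of coefficients $c_{nk}$, $\underline{\Psi}$ be the K x M matrix of K eigenvectors $\underline{\psi}_k$, and $\underline{\Phi}_0$ be the N x M matrix of row vectors $\underline{m}_x$. Then the optimum representation of the data vectors is given in matrix form by

$$\underline{X} \approx \underline{\Phi}_0 + \underline{C}\,\underline{\Psi} \tag{52}$$

where

$$\underline{C} = \underline{Z}\,\underline{\Psi}^T \tag{53}$$

and the mean squared error $E\{\theta^2\}$ is given by

$$E\{\theta^2\} = \mathrm{tr}\Big[ E\big\{ (\underline{X} - \underline{\Phi}_0 - \underline{C}\,\underline{\Psi})(\underline{X} - \underline{\Phi}_0 - \underline{C}\,\underline{\Psi})^T \big\} \Big] = \mathrm{tr}\Big[ E\big\{ (\underline{Z} - \underline{C}\,\underline{\Psi})(\underline{Z} - \underline{C}\,\underline{\Psi})^T \big\} \Big] \tag{54}$$

Substituting for $\underline{C}$ and using the orthonormality of the $\underline{\psi}_k$, whereby $\underline{\Psi}\,\underline{\Psi}^T = \underline{I}$, equation (54) reduces to

$$E\{\theta^2\} = \mathrm{tr}\Big( \frac{1}{N}\, \underline{Z}\,\underline{Z}^T \Big) - \mathrm{tr}\Big( \frac{1}{N}\, \underline{Z}\,\underline{\Psi}^T \underline{\Psi}\,\underline{Z}^T \Big) \tag{55}$$

From the theory of matrices, $\mathrm{tr}(\underline{A}\,\underline{B}) = \mathrm{tr}(\underline{B}\,\underline{A})$. Therefore

$$E\{\theta^2\} = \mathrm{tr}\Big( \frac{1}{N}\, \underline{Z}^T \underline{Z} \Big) - \mathrm{tr}\Big( \frac{1}{N}\, \underline{\Psi}\,\underline{Z}^T \underline{Z}\,\underline{\Psi}^T \Big) \tag{56}$$

Let $\underline{\Psi}_M$ be the M x M matrix of eigenvectors of $\frac{1}{N}\underline{Z}^T\underline{Z}$. Premultiplication and postmultiplication by $\underline{\Psi}_M$ and $\underline{\Psi}_M^T$, respectively, constitute a similarity transformation, and since similar matrices have equal traces, equation (56) may be written as

$$E\{\theta^2\} = \mathrm{tr}\Big( \frac{1}{N}\, \underline{\Psi}_M \underline{Z}^T \underline{Z}\, \underline{\Psi}_M^T \Big) - \mathrm{tr}\Big( \frac{1}{N}\, \underline{\Psi}\,\underline{Z}^T \underline{Z}\,\underline{\Psi}^T \Big) \tag{57}$$

Application of equation (50) produces the following desired expression for the minimum mean squared error:

$$E\{\theta^2\} = \sum_{k=1}^{M} \lambda_k - \sum_{k=1}^{K} \lambda_k = \sum_{k=K+1}^{M} \lambda_k$$

as stated in equation (21). 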
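The matrix form of case I translates directly into a few lines of array code. The following is a minimal sketch with synthetic data; the variable names mirror the symbols of the text but are otherwise assumptions made for illustration. It builds the representation of equations (45b) to (53) and compares the directly computed mean squared error with the sum of the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(4)
N, M, K = 30, 5, 2
X = rng.standard_normal((N, M)) @ rng.standard_normal((M, M)) + 10.0

m_x = X.mean(axis=0)                  # sample mean, eq. (45b)
Phi0 = np.tile(m_x, (N, 1))           # N x M matrix of row vectors m_x
Z = X - Phi0                          # rows z_n = x_n - m_x, eq. (47)
L = Z.T @ Z / N                       # eq. (46)

lam, vecs = np.linalg.eigh(L)
lam, vecs = lam[::-1], vecs[:, ::-1]  # eigenvalues in descending order
Psi = vecs[:, :K].T                   # K x M matrix, rows are psi_k
C = Z @ Psi.T                         # coefficients, eq. (53)
X_hat = Phi0 + C @ Psi                # representation, eq. (52)

mse = np.sum((X - X_hat) ** 2) / N    # (1/N) sum_n ||e_n||^2
print(mse, lam[K:].sum())
```

The two printed values agree to rounding error, and the rows of `Psi` satisfy the eigenvector relation of equation (50).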
Case II: Principal components.- Given an N-size sample (N vectors $\underline{x}_n$) of an M component vector $\underline{x}$ with mean zero, consider the following problem: Represent the vectors $\underline{x}_n$ by linear combinations of orthonormal vectors $\underline{\phi}_k$ as in equation (1) (with $\underline{\phi}_0 = 0$). Select $\underline{\phi}_1$ such that the variance of the coefficient vector $\underline{c}_1$ is maximized. Select $\underline{\phi}_2$ such that the covariance $E\{ \langle \underline{c}_1, \underline{c}_2 \rangle \}$ is zero (coefficient vectors $\underline{c}_1$ and $\underline{c}_2$ are uncorrelated) and the variance of $\underline{c}_2$ is maximized. In general, select $\underline{\phi}_k$ such that the covariance

$$E\{ \langle \underline{c}_i, \underline{c}_k \rangle \} \qquad (i = 1, 2, \ldots, k - 1)$$

is zero and the variance of $\underline{c}_k$ is maximized.

By properties 2 and 3 the solution to the problem is given by equation (31). In this case, the space $\mathcal{V}$ is an M-dimensional unitary space. The operator $\underline{L}$ is defined by the M x M sample covariance matrix $\underline{H}$, where

$$\underline{H} = E\{ \underline{X}^* \underline{X} \} = \frac{1}{N}\, \underline{X}^* \underline{X} \tag{58}$$

and the vectors $\underline{\phi}_k = \underline{\psi}_k$, where the $\underline{\psi}_k$ are eigenvectors of $\underline{H}$. Furthermore, the variance of the coefficient vectors $\underline{c}_k$, averaged over the N components $c_{nk}$, is equal to the eigenvalue $\lambda_k$. Defining $\underline{\Psi}$ and $\underline{X}$ as in case I, the average variance $\sigma^2$ of K coefficient vectors is given by

$$\sigma^2 = \frac{1}{K} \sum_{k=1}^{K} \lambda_k \tag{59}$$

The percentage of the variance explained, or accounted for, by the first K eigenvectors is

$$\frac{\sum_{k=1}^{K} \lambda_k}{\sum_{k=1}^{M} \lambda_k} \times 100$$

The representation just described is commonly known as principal component analysis, and the coefficients $c_{nk}$ are called the principal components of $\underline{x}_n$. (See Rao, ref. 10, p. 501.) This technique, along with other factor analysis methods, has been widely used in the field of psychology in analyzing test results, physical characteristics, and so forth. The original work in this area was done by Hotelling (ref. 2).

Note that the principal component representation is the same as that found in case I, neglecting $\underline{\phi}_0$, even though the objectives were different. 
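The percentage of variance explained is straightforward to compute once the eigenvalues of the sample covariance matrix are available. The sketch below is illustrative only: the data are synthetic, the function name `percent_variance_explained` is hypothetical, and the sample is centered first so that the zero mean assumption of case II holds.

```python
import numpy as np

def percent_variance_explained(X):
    """Cumulative percentage of variance for K = 1, ..., M eigenvectors of H."""
    N = X.shape[0]
    H = X.conj().T @ X / N                    # sample covariance matrix, eq. (58)
    lam = np.linalg.eigvalsh(H)[::-1]         # eigenvalues in descending order
    return 100.0 * np.cumsum(lam) / lam.sum()

rng = np.random.default_rng(5)
X = rng.standard_normal((500, 3)) * [4.0, 2.0, 1.0]
X -= X.mean(axis=0)                           # enforce the zero mean assumption
pct = percent_variance_explained(X)
print(np.round(pct, 2))
```

Entry K - 1 of `pct` is the percentage accounted for by the first K eigenvectors; the last entry is 100 because all M eigenvectors reproduce the data exactly.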

Case III: Feature extraction.- Let $\underline{x}_n$, where n = 1, 2, . . ., N, be a set of real, normalized ($\underline{x}_n \underline{x}_n^T = 1$) M-dimensional vectors. Find a set of orthonormal coordinates $\underline{\phi}_k$ such that the components of the $\underline{x}_n$ are concentrated on a few coordinates, or, in terms of entropy, such that the entropy of the coordinates is minimized.

In terms of the new coordinates $\underline{\phi}_k$, the $\underline{x}_n$ can be expressed as follows:

$$\underline{x}_n = \sum_{k=1}^{M} c_{nk}\,\underline{\phi}_k \qquad (n = 1, 2, \ldots, N) \tag{60}$$

Define the entropy H as

$$H(\underline{\phi}_k) = -\sum_{k=1}^{M} \rho_k \ln \rho_k \tag{61}$$

where

$$\rho_k = E\{\|\underline{c}_k\|^2\} = \frac{1}{N} \sum_{n=1}^{N} c_{nk}^2 \qquad (k = 1, 2, \ldots, M) \tag{62}$$

Now the expansion in equation (60) is the same as equation (1) with $\underline{\phi}_0 = 0$ and K = M. By property 4, the entropy is minimized when the coordinates $\underline{\phi}_k$ are the eigenvectors $\underline{\psi}_k$ of the covariance matrix $\underline{H}$ and the coefficients $c_{nk}$ are the Fourier coefficients.

In matrix notation, let $\underline{X}$ be the matrix of row vectors $\underline{x}_n$, $\underline{\Psi}$ be the matrix of eigenvectors $\underline{\psi}_k$, and $\underline{C}$ be the matrix of coefficients $c_{nk}$. Then

$$\underline{H} = \frac{1}{N}\, \underline{X}^T \underline{X} \tag{63}$$

The expansion of equation (60) can be rewritten as

$$\underline{X} = \underline{C}\,\underline{\Psi} \tag{64}$$

and the coefficients are given by the orthogonal transformation

$$\underline{C} = \underline{X}\,\underline{\Psi}^{-1} = \underline{X}\,\underline{\Psi}^T \tag{65}$$

Furthermore, the entropy is given by

$$H = -\sum_{k=1}^{M} \lambda_k \ln \lambda_k \tag{66}$$

Transformations of this type have been found useful by Watanabe et al. (ref. 4) and others in the areas of feature extraction and pattern recognition. 

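Case IV: Infinite dimensional.- Consider now a problem similar to case I where the N data vectors $\underline{x}_n$ are square integrable functions of a parameter t in the interval $a \le t \le b$. The vector space $\mathcal{V}$ in this case is the infinite dimensional Hilbert space known as $\mathcal{L}_2$ space. It is desired to express the $\underline{x}_n$ in terms of orthonormal vectors $\underline{\phi}_k$, as in equation (1), with minimum mean squared error.

The constant vector $\underline{\phi}_0$ is found from

$$\underline{\phi}_0 = \underline{m}_x = \frac{1}{N} \sum_{n=1}^{N} \underline{x}_n \tag{67a}$$

or

$$m_x(t) = \frac{1}{N} \sum_{n=1}^{N} x_n(t) \tag{67b}$$

Define $\underline{z}(t)$ as

$$\underline{z}(t) = \begin{bmatrix} x_1(t) - m_x(t) \\ x_2(t) - m_x(t) \\ \vdots \\ x_N(t) - m_x(t) \end{bmatrix} \tag{68}$$

Then with the usual inner product, the operator $T_n$ is given by

$$T_n \underline{\phi}_k = \langle \underline{\phi}_k, \underline{x}_n - \underline{m}_x \rangle = \int_a^b \phi_k(t) \big[ x_n(t) - m_x(t) \big]^* dt \tag{69}$$

and $\underline{T}$ is given by

$$\underline{T}\,\underline{\phi}_k = \int_a^b \phi_k(t)\, \underline{z}^*(t)\, dt \tag{70}$$

The inner product $\langle \underline{T}\,\underline{\phi}_k, \underline{T}\,\underline{\phi}_k \rangle$ becomes

$$\langle \underline{T}\,\underline{\phi}_k, \underline{T}\,\underline{\phi}_k \rangle = \int_a^b \int_a^b \underline{z}^*(t)\, \underline{z}(s)\, \phi_k(t)\, \phi_k^*(s)\, ds\, dt \tag{71}$$

from which $\underline{L}$ is defined by

$$\underline{L}\,\underline{\phi}_k = \frac{1}{N} \int_a^b \underline{z}^*(s)\, \underline{z}(t)\, \phi_k(s)\, ds \tag{72}$$

The operator $\underline{L}$ is the integral operator with symmetric kernel R(t,s), where R(t,s) is the correlation function defined by

$$R(t,s) = \frac{1}{N}\, \underline{z}^*(t)\, \underline{z}(s) = \frac{1}{N} \sum_{n=1}^{N} \big[ x_n(t) - m_x(t) \big]^* \big[ x_n(s) - m_x(s) \big] \tag{73}$$

The basis vectors $\underline{\psi}_k$ are normalized solutions of the integral equation

$$\underline{L}\,\underline{\psi}_k = \lambda_k \underline{\psi}_k \qquad (k = 1, 2, \ldots, K) \tag{74a}$$

or

$$\int_a^b R(s,t)\, \psi_k(s)\, ds = \lambda_k \psi_k(t) \tag{74b}$$

The coefficients $c_{nk}$ are determined by the inner product

$$c_{nk} = \int_a^b \psi_k(t) \big[ x_n(t) - m_x(t) \big]^* dt \qquad \begin{pmatrix} n = 1, 2, \ldots, N \\ k = 1, 2, \ldots, K \end{pmatrix} \tag{75}$$

and the mean squared error is given by

$$E\{\theta^2\} = \sum_{k=K+1}^{\infty} \lambda_k \tag{76}$$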
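In practice the integral equation (74b) is usually solved on a grid. The sketch below is one possible discretization and not the method of the paper: it assumes real functions sampled at the midpoints of a uniform grid, replaces each integral by a midpoint-rule sum, and uses synthetic data built from three hypothetical shape functions so that only three eigenvalues are nonzero.

```python
import numpy as np

rng = np.random.default_rng(6)
a, b, n_t, N, K = 0.0, 1.0, 200, 40, 3
dt = (b - a) / n_t
t = a + (np.arange(n_t) + 0.5) * dt                   # midpoint grid on [a, b]

# Hypothetical data: random mixtures of three fixed shapes plus a common offset.
shapes = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), t])
Xf = rng.standard_normal((N, 3)) @ shapes + 1.0       # row n holds x_n(t_i)

m_x = Xf.mean(axis=0)                                 # eq. (67b) on the grid
Zf = Xf - m_x
R = Zf.T @ Zf / N                                     # R(t_i, s_j), eq. (73), real data

lam, vecs = np.linalg.eigh(R * dt)                    # quadrature form of eq. (74b)
lam, vecs = lam[::-1], vecs[:, ::-1]
psi = vecs[:, :K].T / np.sqrt(dt)                     # rows psi_k with unit L2 norm
c = Zf @ psi.T * dt                                   # coefficients, eq. (75)

resid = Zf - c @ psi
mse = np.sum(resid ** 2) * dt / N                     # (1/N) sum_n ||e_n||^2
print(mse, lam[K:].sum())
```

Because the centered data span at most three dimensions, K = 3 reproduces them to rounding error, and the computed error matches the sum of the remaining eigenvalues as in equation (76).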


Case V: Empirical orthogonal functions .- The problem of representing a set of 
data vectors by linear combinations of orthogonal functions has been given considerable 
attention in the field of meteorology, where the vectors $\underline{x}_n$ represent atmospheric temperature
or atmospheric water vapor as a function of altitude or pressure. Numerous 
treatments of the subject appear in the literature. 

In the theoretical development of the orthogonal functions, the literature frequently 
treats the data as a continuous function of pressure; thus the problem is analogous to 
case IV. However, the computational work is done using data measured at the standard 
pressure levels (1000, 850, 700, 500, 300, 200 mb, etc.) in which case the vectors have 
finite dimension as in case I. In any event, the functions are designated "empirical 
orthogonal functions," or occasionally "characteristic patterns," and the objective is to 
reduce the dimensionality of the data by a minimum mean squared error approximation. 
Empirical orthogonal functions have been proposed for use in inverting the data obtained 
via a microwave occultation satellite experiment to obtain atmospheric temperature and 
pressure (ref. 3). 

Example A2 in appendix A is an illustration of the use of empirical orthogonal func- 
tions to represent a set of temperature data. 
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Random Vector Representation 

Case VI: Karhunen-Loeve expansion .- Let x(t) be a zero mean second order 
random process, continuous in quadratic mean, with continuous autocorrelation function 
R(t,s). Let it be desired to represent x(t) on the interval a = t = b by a series 
expansion 


x(t) = ^ c k <*> k (t) 


k=l 


(77) 


such that the coefficients are uncorrelated. 


Let x(t) be written as x(t,cu) to emphasize the fact that the process is a function 
of both t and co, where oj is a point in the probability space In the repre- 

sentation of equation (77), the coefficients c k are functions of cu and the vectors 
are functions of t so that the effects of the variables t and cu are separated. 

Since the process is mean square continuous, almost every sample function, x(t,<u) 

P b 9 

for fixed oj, has finite energy; that is, \ x^(t,cu)dt < °° almost surely. This property 

‘•'a 

is consistent with the assumption at the beginning of the theoretical development that the 
vectors x n are square integrable. It follows that almost every sample function is a 
vector in space, and by property 2 the representation of equation (31) with m x (t) = 0 

is a solution to this problem as posed in equation (77). 


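The operator $T$ is defined by the following inner product in $\mathcal{L}_2$ space:

$$T\phi_k = \int_a^b \phi_k(t)\,x(t,\omega)\,dt \tag{78}$$

The inner product $\langle T\phi_k, T\phi_k \rangle$ can be expressed as

$$\langle T\phi_k, T\phi_k \rangle = \left( \int_a^b \phi_k(t)\,x(t,\omega)\,dt \right) \left( \int_a^b \phi_k(s)\,x(s,\omega)\,ds \right) \tag{79}$$

and equation (79) can be rearranged as

$$\begin{aligned}
\langle T\phi_k, T\phi_k \rangle &= \int_a^b \int_a^b x(t,\omega)\,x(s,\omega)\,\phi_k(t)\,\phi_k(s)\,dt\,ds \\
&= \int_a^b \phi_k(t) \int_a^b x(t,\omega)\,x(s,\omega)\,\phi_k(s)\,ds\,dt \\
&= \langle \phi_k, T^{*}T\phi_k \rangle
\end{aligned} \tag{80}$$

where the inner product $\langle \phi_k, T^{*}T\phi_k \rangle$ is in $\mathcal{L}_2$ space. Take the
expected value by integrating over all $\omega$ in the space $(\Omega, \mathcal{B}, \mathcal{P})$:

$$\begin{aligned}
E\{\langle T\phi_k, T\phi_k \rangle\} &= E\left\{ \int_a^b \phi_k(t) \int_a^b x(t,\omega)\,x(s,\omega)\,\phi_k(s)\,ds\,dt \right\} \\
&= \int_a^b \phi_k(t) \int_a^b \int_{\Omega} x(t,\omega)\,x(s,\omega)\,d\mathcal{P}(\omega)\,\phi_k(s)\,ds\,dt \\
&= \int_a^b \phi_k(t) \int_a^b E\{x(t,\omega)\,x(s,\omega)\}\,\phi_k(s)\,ds\,dt \\
&= \int_a^b \phi_k(t) \int_a^b R(t,s)\,\phi_k(s)\,ds\,dt \\
&= \langle \phi_k, L\phi_k \rangle
\end{aligned} \tag{81}$$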


The operator $L$ is then the integral operator with kernel $R(t,s)$, where $R(t,s)$ is the
correlation function of the random process $x(t)$. The functions $\phi_k(t)$ are the
eigenfunctions $\psi_k(t)$ of $R(t,s)$; that is, they are solutions of the integral equation

$$\int_a^b R(t,s)\,\psi_k(s)\,ds = \lambda_k \psi_k(t) \tag{82}$$

The coefficients in this case are determined from the integral

$$c_k = \int_a^b \psi_k(t)\,x(t)\,dt \tag{83}$$

This solution is, of course, the familiar Karhunen-Loeve expansion, and it converges
uniformly in mean square to the process $x(t)$. According to property 3 this expansion
maximizes, in turn, the variance of each coefficient (random variable) $c_k$, and this
variance equals $\lambda_k$. Furthermore, if the expansion is truncated at $K$ terms, by
property 1 the Karhunen-Loeve expansion minimizes the mean squared truncation error, whose
value is $\sum_{k=K+1}^{\infty} \lambda_k$. These latter two properties of the series frequently
are not discussed in developments of the expansion in the literature.
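
As a numerical illustration of equation (82), the integral equation can be approximated on a
grid and solved as a matrix eigenproblem. The sketch below is not part of the original
development. It assumes a midpoint-rule discretization, and it uses the kernel
R(t,s) = min(t,s) on [0,1] purely as a test case, because the eigenvalues of that kernel,
1/((k - 1/2)^2 pi^2), are known in closed form. The function name `kl_eigen` and the grid
size are hypothetical choices.

```python
import numpy as np

def kl_eigen(kernel, a, b, n=200):
    """Approximate eigenvalues and eigenfunctions of the integral operator with kernel R(t,s)."""
    h = (b - a) / n                          # midpoint-rule weight
    t = a + h * (np.arange(n) + 0.5)         # midpoint grid on [a, b]
    R = kernel(t[:, None], t[None, :])       # matrix of R(t_i, s_j)
    lam, vec = np.linalg.eigh(R * h)         # discretized form of equation (82)
    order = np.argsort(lam)[::-1]            # largest eigenvalue first
    psi = vec[:, order] / np.sqrt(h)         # scale so that sum(psi_k**2) * h = 1
    return t, lam[order], psi

# Assumed test kernel (not from the report): R(t,s) = min(t,s) on [0, 1]
t, lam, psi = kl_eigen(np.minimum, 0.0, 1.0)
exact = 1.0 / ((np.arange(1, 4) - 0.5) ** 2 * np.pi ** 2)

K = 3
truncation_mse = lam[K:].sum()               # sum of the discarded eigenvalues (property 1)
print(lam[:3], exact, truncation_mse)
```

The first few numerical eigenvalues should track the closed-form values closely, and the sum of
the discarded eigenvalues gives the mean squared truncation error for the chosen K.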

Case VII: Generalized Karhunen-Loeve expansion.- Following the development of
Chien and Fu (ref. 8), suppose $x_n(t)$, where $a \le t \le b$, $n = 1, 2, \ldots, N$, are $N$
stochastic processes with probability of occurrence $p_n$, where $\sum_{n=1}^{N} p_n = 1$. In
equation (2) let the squared error in the representation of each process be weighted according
to the probability of occurrence of the process. In other words, define the expected value of
the total squared error as follows:

$$E\{e^2\} \triangleq \sum_{n=1}^{N} p_n E\left\{ \| e_n \|^2 \right\} \tag{84}$$


With this definition of mean squared error, the processes x n can be represented by a 
generalized Karhunen-Loeve expansion as in equation (31), and this representation is 
similar to the standard Karhunen-Loeve expansion in case VI in that both expansions 
exhibit properties 1 to 4. 

In case VII the operator $L$ is again an integral operator with kernel $R(t,s)$, where
$R(t,s)$ is a generalized autocorrelation function for the $N$ processes defined by

$$R(t,s) = \sum_{n=1}^{N} p_n E\left\{ \left[ x_n(t) - m_x(t) \right] \left[ x_n(s) - m_x(s) \right] \right\} \tag{85}$$

and the functions $\phi_k(t)$ are the eigenfunctions $\psi_k(t)$ of $R(t,s)$. The function
$m_x(t)$ is the expected value of the $x_n(t)$, that is

$$m_x(t) = \sum_{n=1}^{N} p_n E\{ x_n(t) \} \tag{86}$$

and the coefficients $c_{nk}$ are determined from the integral

$$c_{nk} = \int_a^b \psi_k(t) \left[ x_n(t) - m_x(t) \right] dt \tag{87}$$
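
A finite-dimensional sketch of case VII may help fix ideas. It is an illustration under stated
assumptions rather than the procedure of ref. 8: each process is replaced by a hypothetical
array of sampled realizations, expectations are replaced by sample averages, and the function
name `generalized_kl`, the class means, and the probabilities are invented for the example.
The kernel is formed as the probability-weighted correlation about the weighted mean, in the
spirit of equation (85), and coefficients follow the discrete analog of equation (87).

```python
import numpy as np

def generalized_kl(classes, priors):
    """Probability-weighted mean, eigenvalues, and basis vectors (rows) for several sample sets."""
    priors = np.asarray(priors, dtype=float)
    # m_x: probability-weighted average of the class sample means
    m = sum(p * X.mean(axis=0) for p, X in zip(priors, classes))
    # R: probability-weighted correlation about m_x (discrete analog of equation (85))
    R = sum(p * (X - m).T @ (X - m) / len(X) for p, X in zip(priors, classes))
    lam, vec = np.linalg.eigh(R)
    order = np.argsort(lam)[::-1]
    return m, lam[order], vec[:, order].T

# Hypothetical data: three classes of 4-dimensional sample vectors
rng = np.random.default_rng(0)
classes = [rng.normal(loc=mu, size=(500, 4)) for mu in (-1.0, 0.5, 2.0)]
priors = [0.5, 0.3, 0.2]
m, lam, psi = generalized_kl(classes, priors)

K = 2
P = psi[:K].T @ psi[:K]                      # projector onto the first K basis vectors
mse = sum(p * np.mean(np.sum(((X - m) - (X - m) @ P) ** 2, axis=1))
          for p, X in zip(priors, classes))  # weighted squared error, as in equation (84)
print(mse, lam[K:].sum())
```

With the weighting of equation (84), the weighted truncation error equals the sum of the
discarded eigenvalues, which the two printed numbers should confirm.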


Case VIII: Discrete Karhunen-Loeve expansion.- Suppose the zero mean random
vector $x$ of case VI is finite dimensional. Now the operator $L$ becomes the covariance
matrix $\Sigma$

$$\Sigma = E\{ x^T x \} \tag{88}$$

and the basis vectors are the normalized eigenvectors $\psi_k$ of $\Sigma$. The properties of the
coefficients $c_k$, which are determined by

$$c_k = \langle \psi_k, x \rangle = \psi_k x^T \tag{89}$$

remain unchanged; that is, they are uncorrelated with variance $\lambda_k$

$$E\{ c_j c_k \} = \lambda_k \delta_{jk} \tag{90}$$

If the expansion is truncated at $k = K$, the expected value of the mean squared error is
again

$$E\{e^2\} = \sum_{k=K+1}^{M} \lambda_k \tag{91}$$
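
The statements of case VIII can be checked numerically. The sketch below is an illustration
only: the data are synthetic, expectations are replaced by sample averages, and vectors are
stored as rows to match the inner product form used for the coefficients. All variable names
are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
M, n_samples, K = 6, 2000, 2

# Synthetic zero-mean row vectors x (hypothetical data, not from the report)
X = rng.normal(size=(n_samples, M)) @ rng.normal(size=(M, M))
X -= X.mean(axis=0)

Sigma = X.T @ X / n_samples                  # sample covariance matrix (the operator L)
lam, vec = np.linalg.eigh(Sigma)
order = np.argsort(lam)[::-1]
lam, Psi = lam[order], vec[:, order].T       # rows of Psi are the eigenvectors psi_k

C = X @ Psi.T                                # coefficients c_k = psi_k x^T, equation (89)
coef_cov = C.T @ C / n_samples               # sample version of E{c_j c_k}, equation (90)

X_hat = C[:, :K] @ Psi[:K]                   # expansion truncated at k = K
mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))
print(np.round(np.diag(coef_cov), 3), np.round(lam, 3))
print(mse, lam[K:].sum())                    # the two sides of equation (91)
```

The coefficient covariance comes out diagonal with the eigenvalues on the diagonal, and the
truncation error matches the sum of the discarded eigenvalues.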


CONCLUDING REMARKS 

A comprehensive development of the representation of a set of random or deterministic
vectors of any dimension in terms of a set of orthonormal vectors has been presented.
Four important properties of this representation have been proven. These are: (1) the
representation approximates the vectors with minimum mean squared error, (2) the
coefficient vectors are uncorrelated, (3) the norms of the coefficient vectors are
maximized in turn, and (4) the entropy of the coordinate vectors is minimized. The general
representation has been specialized to several applications including the familiar
Karhunen-Loeve expansion and the use of empirical orthogonal functions to reduce the
dimensionality of atmospheric temperature profiles. For further clarification some
illustrative numerical examples are included in an appendix.

Langley Research Center, 

National Aeronautics and Space Administration, 

Hampton, Va., January 16, 1973. 





APPENDIX A 


SOME ILLUSTRATIVE EXAMPLES 


Example A1 

Consider the set of five data points with Cartesian coordinates x, y, and z as
tabulated below and illustrated in figure A1.

    Point      x       y       z
      1       -1      -2       5
      2       -2       0       3
      3       -2       3       3
      4        2       3       0
      5        3       0       1.6


Find the plane upon which the data can be projected with minimum mean squared error. 

In terms of linear vector spaces the data are vectors $v_n$ in a three-dimensional
Euclidean space. The problem is to determine the constant vector $\phi_0$ and orthonormal
coordinate vectors $\phi_1$ and $\phi_2$ which define the two-dimensional hyperplane (or flat)
upon which the data vectors can be projected with minimum error.

The constant vector $\phi_0$ is the average $m_v$ of the vectors $v_n$

$$m_v = \frac{1}{5} \sum_{n=1}^{5} v_n = (0,\ 0.8,\ 2.52) \tag{A1}$$


The operator $L$ is the 3x3 sample covariance matrix $\Sigma$

$$\Sigma = \frac{1}{5} U^T U = \frac{1}{5} \begin{bmatrix} 22.0 & 2.0 & -12.2 \\ 2.0 & 18.8 & -11.1 \\ -12.2 & -11.1 & 13.8 \end{bmatrix} \tag{A2}$$

where $U$ is the 5x3 matrix of row vectors $u_n = v_n - m_v$. The eigenvalues of $\Sigma$
are

$$\lambda_1 = 7.06, \qquad \lambda_2 = 3.65, \qquad \lambda_3 = 0.22 \tag{A3}$$

The coordinate vectors $\phi_1$ and $\phi_2$ are the eigenvectors $\psi_1$ and $\psi_2$
corresponding to the eigenvalues $\lambda_1$ and $\lambda_2$

$$\psi_1 = (0.630,\ 0.484,\ -0.607), \qquad \psi_2 = (-0.654,\ 0.752,\ -0.079) \tag{A4}$$

Note that the new coordinate vectors $\psi_1$ and $\psi_2$ are orthonormal (within round-off
error) since

$$\langle \psi_1, \psi_1 \rangle = 1.000, \qquad \langle \psi_2, \psi_2 \rangle = 0.999, \qquad \langle \psi_1, \psi_2 \rangle = 0.000 \tag{A5}$$

The hyperplane then is the plane through the point $m_v$ containing the vectors $\psi_1$ and
$\psi_2$ translated by $m_v$, as illustrated in figure A1.

The coefficients, or coordinates, $c_{nk}$ are found from

$$c_{nk} = \langle \psi_k, v_n - m_v \rangle = \psi_k u_n^T \tag{A6}$$

The mean squared error $E\{e^2\}$ produced by representing the data as points on the
plane is given by

$$E\{e^2\} = \frac{1}{5} \sum_{n=1}^{5} c_{n3}^2 = 0.217 \tag{A7}$$

where $c_{n3}$ is the coordinate of $u_n$ along $\psi_3$.

Note that the average variance of the projected data is

$$\sigma_p^2 = \frac{1}{5} \sum_{n=1}^{5} \sum_{k=1}^{2} c_{nk}^2 = \lambda_1 + \lambda_2 \tag{A8}$$

The average variance of the data is

$$\sigma^2 = \mathrm{tr}\left( \frac{1}{5} U^T U \right) = \frac{1}{5} \sum_{n=1}^{5} \sum_{k=1}^{3} c_{nk}^2 = \lambda_1 + \lambda_2 + \lambda_3 \tag{A9}$$

The percentage of the variance explained by the projected data

$$\frac{\sigma_p^2}{\sigma^2} = \frac{\lambda_1 + \lambda_2}{\lambda_1 + \lambda_2 + \lambda_3} \tag{A10}$$

is 98 percent.
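
The numbers in example A1 can be reproduced in a few lines. The sketch below is offered as a
check rather than as the original computation: it assumes NumPy's symmetric eigensolver and
stores the five data points as rows, and the variable names are hypothetical. Eigenvector
signs returned by the solver are arbitrary, so they may differ from the signs of the vectors
in equation (A4).

```python
import numpy as np

# The five data points of example A1, one row per point (x, y, z)
V = np.array([[-1.0, -2.0, 5.0],
              [-2.0,  0.0, 3.0],
              [-2.0,  3.0, 3.0],
              [ 2.0,  3.0, 0.0],
              [ 3.0,  0.0, 1.6]])

m_v = V.mean(axis=0)                         # average vector, equation (A1)
U = V - m_v                                  # rows u_n = v_n - m_v
Sigma = U.T @ U / len(V)                     # sample covariance matrix, equation (A2)

lam, vec = np.linalg.eigh(Sigma)
order = np.argsort(lam)[::-1]
lam, Psi = lam[order], vec[:, order].T       # rows of Psi are psi_1, psi_2, psi_3

C = U @ Psi.T                                # coordinates c_nk, equation (A6)
mse = np.mean(C[:, 2] ** 2)                  # error of the planar fit, equation (A7)
explained = lam[:2].sum() / lam.sum()        # variance ratio, equation (A10)

print(m_v, np.round(lam, 2), round(mse, 3), round(100 * explained))
```

The printed mean, eigenvalues, error, and percentage should agree with equations (A1), (A3),
(A7), and (A10) to the number of digits quoted there.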


Example A2 

Consider the set of atmospheric temperature profiles obtained by radiosonde at
Charleston, S.C., during January 1 to 15, 1966. The set consists of the temperatures
taken at 0000 GMT and 1200 GMT at the nine standard pressure levels of 850, 700, 500,
400, 300, 250, 200, 150, and 100 mb for a total of 30 profiles, or data vectors $t_n$, where
n = 1, 2, . . ., 30.

It is desired to reduce the dimensionality of the data set by approximating the profiles
by a linear combination of K < 9 orthonormal vectors as follows:

$$t_n = \phi_0 + \sum_{k=1}^{K} c_{nk} \phi_k \qquad (n = 1, 2, \ldots, 30) \tag{A11}$$

Find the set of vectors $\phi_k$, or empirical orthogonal functions, that best approximate the
data in terms of mean squared error. The vector space $\mathcal{V}$ in this instance is nine
dimensional, with each vector having as its components the temperatures at the nine pressure
levels. From equations (29) and (30) the optimum $\phi_0$ is the average temperature
profile $m_t$ found as follows:

$$m_t = \frac{1}{30} \sum_{n=1}^{30} t_n \tag{A12}$$


The temperature deviations $z_n$ from the mean were then computed

$$z_n = t_n - m_t \qquad (n = 1, 2, \ldots, 30) \tag{A13}$$

The 9x9 sample covariance matrix $\Sigma$ (the operator $L$ for this problem) was
calculated from

$$\Sigma = \frac{1}{30} Z^T Z \tag{A14}$$

where $Z$ is the 30 x 9 matrix of temperature deviations $z_n$.

The empirical orthogonal functions $\phi_k$ are the eigenvectors $\psi_k$ of $\Sigma$
corresponding to the K largest eigenvalues. The eigenvalues are tabulated in table A1, and the
first three eigenvectors are plotted in figure A2. The percentage of the variance
explained by K eigenvectors $\sigma_p^2/\sigma^2$ and the root-mean-square temperature error
$t_{rms}$ were computed using the following equations (A15) and (A16), respectively:

(A15)

(A16)

The results are tabulated in table A1.


TABLE A1.- RESULTS OF TEMPERATURE-PROFILE REPRESENTATION

     K     $\lambda_k$, °C^2     $\sigma_p^2/\sigma^2$, percent     $t_{rms}$, °C
     1        44.65                  67.7                          2.23
     2         7.62                  78.8                           .92
     3         4.40                  85.1                           .70
     4         3.61                  90.4                           .63

From table A1 it can be seen that for this set of data one empirical orthogonal
function accounts for two-thirds of the temperature variance and that three functions can
approximate the temperature with a root-mean-square error of 0.7° C.


Example A3 

Consider the random process shown in figure A3, where $\zeta$ is a random variable
with uniform distribution. The probability density function for $\zeta$ is then

$$f_\zeta(\zeta) = \begin{cases} \ldots \\ 0 & \text{(elsewhere)} \end{cases} \tag{A17}$$

Represent the process $x(t)$ in the interval $\left[ -\frac{T}{2}, \frac{T}{2} \right]$ by a
series of the form

$$x(t) = \sum_{k=1}^{\infty} c_k \psi_k(t) \tag{A18}$$

where the coefficients $c_k$ are uncorrelated.

From property 2 the coefficients $c_k$ are uncorrelated when the functions $\psi_k(t)$
are the eigenfunctions of the integral operator $L$ whose kernel is the correlation
function $R(t,s)$ of the process $x(t)$. The correlation function is a triangular function
with period T, as shown in figure A4.



Figure A4.- Correlation function of x(t).


The eigenfunctions $\psi_k(t)$ are solutions of the following integral equation (from
eq. (82)):

$$\int_{-T/2}^{T/2} R(s,t)\,\psi_k(s)\,ds = \lambda_k \psi_k(t) \tag{A19}$$

The infinite set of sines and cosines, $\sin \omega_k t$ and $\cos \omega_k t$, are solutions to
the integral equation (A19). These functions are substituted into equation (A19) to determine
the eigenvalues $\lambda_k$:

$$\int_{-T/2}^{T/2} R(s,t) \begin{Bmatrix} \sin \omega_k s \\ \cos \omega_k s \end{Bmatrix} ds = \lambda_k \begin{Bmatrix} \sin \omega_k t \\ \cos \omega_k t \end{Bmatrix} \tag{A20}$$

Evaluation of the integral in equation (A20) is somewhat tedious since the interval of
integration must be divided into several subintervals, because the analytic expression for
the correlation function is different for different areas in the s - t plane. This is
illustrated in figure A5. The area of integration is the square corresponding to
$-T/2 \le s \le T/2$, $-T/2 \le t \le T/2$.

Figure A5.- Autocorrelation function in the s - t plane.


Evaluation of the integral for $\cos \omega_k t$ yields the identity


w k T 2 w k 2 T 


1 - cos 


w k T 

+ sin — — - sin w k t = A_ k cos u> k t (A21) 

Li 

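This identity will be satisfied if

(A22a)

$$\omega_k = \frac{2\pi k}{T} \qquad (k = 1, 2, \ldots) \tag{A22b}$$

Then the eigenvalues $\lambda_k$ are determined by

$$\lambda_k = \frac{8A^2}{\omega_k^2 T} \left( 1 - \cos \frac{\omega_k T}{2} \right) \qquad (k = 1, 2, \ldots) \tag{A23a}$$

or

$$\lambda_k = \begin{cases} \dfrac{4A^2 T}{k^2 \pi^2} & (k = 1, 3, \ldots) \\ 0 & (k = 2, 4, \ldots) \end{cases} \tag{A23b}$$

The normalized eigenfunctions then are

$$\psi_k(t) = \begin{cases} \sqrt{\dfrac{2}{T}} \cos \dfrac{k 2\pi t}{T} \\ \sqrt{\dfrac{2}{T}} \sin \dfrac{k 2\pi t}{T} \end{cases} \qquad (k = 1, 2, \ldots) \tag{A24}$$

where $\psi_k(t)$, for k even, are members of the null set of the operator $L$.
Now the process $x(t)$ can be represented by the series

$$x(t) = \sqrt{\frac{2}{T}} \sum_{k\ \mathrm{odd}} \left( a_k \cos \frac{k 2\pi t}{T} + b_k \sin \frac{k 2\pi t}{T} \right) \tag{A25}$$

where the random coefficients are determined by the stochastic integrals

$$a_k = \sqrt{\frac{2}{T}} \int_{-T/2}^{T/2} x(t) \cos \frac{k 2\pi t}{T}\,dt \tag{A26a}$$

and

$$b_k = \sqrt{\frac{2}{T}} \int_{-T/2}^{T/2} x(t) \sin \frac{k 2\pi t}{T}\,dt \tag{A26b}$$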


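Evaluation of these integrals yields

$$a_k = \begin{cases} \dfrac{2A\sqrt{2T}}{k\pi} \sin \dfrac{k 2\pi \zeta}{T} & (k = 1, 3, \ldots) \\ 0 & (k = 2, 4, \ldots) \end{cases} \tag{A27a}$$

and

$$b_k = \begin{cases} \dfrac{2A\sqrt{2T}}{k\pi} \cos \dfrac{k 2\pi \zeta}{T} & (k = 1, 3, \ldots) \\ 0 & (k = 2, 4, \ldots) \end{cases} \tag{A27b}$$

The variance of $a_k$ is

(A28a)

$$E\{a_k^2\} = \begin{cases} \dfrac{4A^2 T}{k^2 \pi^2} & (k = 1, 3, \ldots) \\ 0 & (k = 2, 4, \ldots) \end{cases} \tag{A28b}$$

which agrees with $\lambda_k$ in equations (A23b). The variance of $b_k$ is the same as that of
$a_k$. The total power $P_k$ in the kth harmonic is

(k = 1, 2, . . .) (A29a)

or

(k = 1, 3, . . .)
(k = 2, 4, . . .) (A29b)

which agrees with the results obtained by conventional Fourier series analysis of a
square wave.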
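
The eigenvalues of equation (A23b) can be checked numerically by discretizing the integral
operator. The sketch below rests on assumptions that are not stated explicitly in the
surviving text: the process is taken to be a square wave of amplitude A and period T with a
uniformly distributed time shift, so that its correlation function is the periodic triangle
R(tau) = A^2 (1 - 4|tau|/T) for |tau| <= T/2. The midpoint grid, the values A = T = 1, and
the function name `triangular_corr` are hypothetical choices.

```python
import numpy as np

A, T, n = 1.0, 1.0, 400                      # assumed amplitude, period, and grid size

def triangular_corr(tau):
    """Periodic triangular correlation: A^2 at tau = 0 and -A^2 at tau = +/- T/2."""
    tau = (tau + T / 2) % T - T / 2          # wrap the lag into [-T/2, T/2)
    return A**2 * (1.0 - 4.0 * np.abs(tau) / T)

h = T / n
t = -T / 2 + h * (np.arange(n) + 0.5)        # midpoint grid on [-T/2, T/2]
R = triangular_corr(t[:, None] - t[None, :]) # kernel sampled on the s - t square
lam = np.sort(np.linalg.eigvalsh(R * h))[::-1]

k = np.array([1, 1, 3, 3])                   # each odd harmonic has a sine and a cosine
closed_form = 4 * A**2 * T / (k**2 * np.pi**2)
print(lam[:4], closed_form)
```

Each odd harmonic appears twice in the numerical spectrum, once for the cosine and once for
the sine eigenfunction, and the even harmonics contribute eigenvalues near zero.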
APPENDIX B


PROOF THAT THE OPERATOR L IS COMPACT 


The linear self-adjoint operator L on the vector space V was defined by equa- 
tion (14) as follows: 



The operator T is defined by equation (10) as follows: 



If the space $\mathcal{V}$ is a finite dimensional Euclidean or unitary space, the operator $L$
is finite (has a finite dimensional range) and is thus compact. (See ref. 11, p. 37.) If the
space $\mathcal{V}$ is $\mathcal{L}_2$ space and the vectors $x$ are deterministic, the operator
$L$ is again finite and compact.

The remaining case of interest is that when the vector $x$ is a random process with
sample functions in $\mathcal{L}_2$ space. Assume the process $x$ is measurable on
$[a,b] \times \Omega$ and mean square integrable; that is,
$\int_a^b E\{x^2(t)\}\,dt < \infty$. The operator $L$ is defined by

$$Lf = E\{ \langle f, z \rangle z \} \tag{B1}$$

where $z = x - \phi_0$. In $\mathcal{L}_2$ space equation (B1) becomes

$$Lf = \int_{\Omega} \left[ \int_a^b f(t)\,z(t,\omega)\,dt \right] z(s,\omega)\,d\mathcal{P}(\omega) \tag{B2}$$

By Fubini's theorem on iterated integrals (ref. 12, p. 135)

$$Lf = \int_a^b f(t) \int_{\Omega} z(s,\omega)\,z(t,\omega)\,d\mathcal{P}(\omega)\,dt = \int_a^b f(t)\,E\{z(s)z(t)\}\,dt = \int_a^b R(s,t)\,f(t)\,dt \tag{B3}$$

Since the process is mean square integrable the kernel $R(s,t)$ is square integrable on
the square interval $[a,b] \times [a,b]$. Thus the operator $L$ is compact because every
operator with square integrable kernel is compact (ref. 11, p. 47). The proof is complete.